Background
Clinical trial data are a critical and valuable, but difficult-to-access, source of information about new medicines. Privacy & intellectual property concerns limit the availability of trial data for secondary, exploratory analysis. Generative models trained on historical clinical trial data can produce synthetic datasets that preserve patient privacy while maintaining the characteristics of the source data. This gives analysts access to clinical insights without compromising privacy. Uses of these data include designing clinical trials, generating evidence to support subgroup analyses or matched external controls, and augmenting trial data sources for machine learning.
We report how synthetic data from CAR T trials are used to model safety & efficacy in a trial setting, with a special focus on analyses to design a mitigation strategy for prolonged leukopenia following CAR T infusion.
Methods
Synthetic Data Generation
Synthetic data were generated using Simulants, an algorithm that creates synthetic patients by permuting features among similar patients. This approach significantly outperforms deep-learning approaches on clinical trial data. Patient-level data were synthesized from multiple completed clinical trials from the Medidata Clinical Cloud.
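The Simulants algorithm itself is not specified in this abstract, so the following is only a simplified Python sketch of the general idea of permuting features among similar patients: for each real patient, find the k most similar patients and draw each feature value from a random member of that group. The function names, the Euclidean distance on pre-standardized numeric features, and the choice of k are illustrative assumptions, not the published method.

```python
import math
import random
from typing import Dict, List

# Hypothetical record layout: feature name -> numeric value (assumed pre-standardized).
Patient = Dict[str, float]


def distance(a: Patient, b: Patient, keys: List[str]) -> float:
    """Euclidean distance between two patients on the given features."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def permute_among_neighbors(patients: List[Patient], k: int = 3, seed: int = 0) -> List[Patient]:
    """Hypothetical sketch: one synthetic record per real patient, with each
    feature drawn from a random member of that patient's neighbor group."""
    if len(patients) < 2:
        raise ValueError("need at least two patients to form neighbor groups")
    rng = random.Random(seed)
    keys = list(patients[0])
    synthetic: List[Patient] = []
    for i, anchor in enumerate(patients):
        # Rank the other patients by similarity to the anchor patient.
        others = sorted(
            (p for j, p in enumerate(patients) if j != i),
            key=lambda p, a=anchor: distance(a, p, keys),
        )
        group = [anchor] + others[:k]
        # Draw each feature independently from the group, so a synthetic row
        # usually mixes values from several similar real patients.
        synthetic.append({key: rng.choice(group)[key] for key in keys})
    return synthetic
```

Drawing features within a small neighborhood keeps each synthetic row plausible while weakening the link to any single real patient; a real tool would also need to handle categorical and longitudinal variables, which this sketch ignores.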
Clinically relevant analysis planning
In collaboration with academic & industry experts, we translated analytical insights into practical advancements in treatment safety and efficacy. Applications include designing lymphodepletion strategies to optimize CAR T therapy efficacy, predicting severe cytokine release syndrome (CRS), and analyzing co-occurrence patterns of CRS and immune effector cell-associated neurotoxicity syndrome (ICANS).
We focused on prolonged leukopenia and immune-system recovery following CAR T therapy because it is a common side effect, associated with life-threatening infections, that is difficult to study without access to patient-level data. Adverse events such as infections, or point-in-time abnormal blood cell counts, may appear in published trial reports as graded adverse events, but blood test dynamics are not reported.
Analytical approaches
Analysis methods included:
Descriptive analyses of the temporal evolution of blood analytes
Risk factor analysis using models to identify factors that predict prolonged leukopenia or affect leukocyte recovery
Intervention modeling through analysis of treatment approaches observed in the trial data, such as use of granulocyte colony-stimulating factor
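The intervention-modeling item above can be illustrated with G-CSF timing: partition patients by the day of their first G-CSF dose and compare the proportion with prolonged leukopenia between groups. The record fields, the 30-day window as a parameter, and the rule that late or absent G-CSF counts as "not early" are assumptions for this sketch. Because the comparison is observational, a difference between groups is an association, not evidence of a causal effect.

```python
from typing import Dict, List, NamedTuple, Optional


class GcsfRecord(NamedTuple):
    # Hypothetical fields, not taken from the study data model.
    gcsf_first_day: Optional[int]  # day of first G-CSF dose after infusion; None if never given
    prolonged_leukopenia: bool     # outcome flag, however the analysis defines it


def is_early(record: GcsfRecord, window_days: int) -> bool:
    """True if the first G-CSF dose fell within the early window."""
    return record.gcsf_first_day is not None and record.gcsf_first_day <= window_days


def proportion(flags: List[bool]) -> float:
    """Share of True flags; NaN for an empty group."""
    return sum(flags) / len(flags) if flags else float("nan")


def compare_early_gcsf(records: List[GcsfRecord], window_days: int = 30) -> Dict[str, float]:
    """Proportion with prolonged leukopenia: early-G-CSF patients versus all others."""
    early = [r.prolonged_leukopenia for r in records if is_early(r, window_days)]
    other = [r.prolonged_leukopenia for r in records if not is_early(r, window_days)]
    p_early, p_other = proportion(early), proportion(other)
    return {"early": p_early, "other": p_other, "risk_difference": p_early - p_other}
```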
Results
Synthetic Data Fidelity and Privacy
Synthetic data demonstrated high fidelity to the source datasets (silhouette score: -0.083; bag-of-words R²: 0.99), supporting the reliability of downstream analyses while protecting individual privacy (membership disclosure AUC ROC: 0.62).
Clinically meaningful insights
Distinct patterns in leukocyte dynamics were observed in patients with prolonged leukopenia compared with those who recovered quickly. Leukocyte counts initially dropped sharply after infusion in all recovery groups; thereafter, patients who recovered showed a consistent increase in leukocytes, whereas patients who did not recover fluctuated below the leukopenia threshold. Patients with partial recovery plateaued around Day 50, with minimal late recovery.
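The recovery groups and their leukocyte curves described above could be derived along these lines: classify each patient by the last count at or after a landmark day, then take the median count per time window within each group. The 4.0 × 10^9/L threshold, the 75% cut-off for partial recovery, the Day 30 landmark, and the 7-day window are placeholder assumptions; the abstract does not report the definitions that were used.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# (day relative to CAR T infusion, leukocytes in 10^9/L)
Series = List[Tuple[int, float]]

# Placeholder thresholds (assumptions, not taken from the study).
LEUKOPENIA_THRESHOLD = 4.0
PARTIAL_FRACTION = 0.75
LANDMARK_DAY = 30


def classify_recovery(series: Series, threshold: float = LEUKOPENIA_THRESHOLD,
                      partial_fraction: float = PARTIAL_FRACTION,
                      landmark_day: int = LANDMARK_DAY) -> str:
    """Classify one patient by the last count on or after the landmark day."""
    late = [count for day, count in sorted(series) if day >= landmark_day]
    if not late:
        return "unknown"  # no follow-up after the landmark
    last = late[-1]
    if last >= threshold:
        return "recovered"
    if last >= partial_fraction * threshold:
        return "partial"
    return "not recovered"


def group_median_curves(patients: Dict[str, Series],
                        window_days: int = 7) -> Dict[str, Dict[int, float]]:
    """Median leukocyte count per time window within each recovery group.
    Each measurement counts once, so patients with frequent draws weigh more."""
    pooled: Dict[str, Dict[int, List[float]]] = defaultdict(lambda: defaultdict(list))
    for series in patients.values():
        group = classify_recovery(series)
        for day, count in series:
            # Floor division also bins pre-infusion (negative) days correctly.
            pooled[group][(day // window_days) * window_days].append(count)
    return {group: {start: median(values) for start, values in sorted(bins.items())}
            for group, bins in pooled.items()}
```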
Elevated post-infusion ferritin and pre-infusion leukocyte count were identified as significant predictors of prolonged leukopenia.
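The abstract does not name the model used for the risk factor analysis. As a deliberately simpler stand-in, this sketch computes a univariable odds ratio with a Woolf (log-scale) confidence interval from a 2×2 table, for example elevated versus non-elevated post-infusion ferritin against prolonged leukopenia. The dichotomization and the function name are assumptions. An interval that excludes 1 is one conventional reading of "significant predictor"; a multivariable model would additionally adjust for other factors.

```python
import math
from statistics import NormalDist
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  level: float = 0.95) -> Tuple[float, float, float]:
    """Odds ratio and Woolf confidence interval for a 2x2 table:
    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome."""
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf interval is undefined without a continuity correction")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```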
Early granulocyte colony-stimulating factor (G-CSF) administration within the first 30 days of CAR T therapy was associated with less long-term leukopenia, highlighting the potential benefit of early intervention.
Cross-validation of findings in synthetic data with source data
The same analyses were conducted in parallel on the source data. We observed qualitatively similar results, reinforcing the reliability of the synthetic data for analysis. Further, no statistically significant differences in any key finding were observed between the source & synthetic data.
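One way to test a key finding for a difference between source and synthetic data is a z-test on the difference between the two estimates, for example two log odds ratios with their standard errors. The abstract does not state which test was used, and this sketch treats the two estimates as independent, which is only an approximation because the synthetic data are derived from the source data.

```python
import math
from statistics import NormalDist
from typing import Tuple


def compare_estimates(est_source: float, se_source: float,
                      est_synth: float, se_synth: float) -> Tuple[float, float]:
    """Two-sided z-test for the difference between two estimates
    (e.g. log odds ratios), assuming independence. Returns (z, p)."""
    se_diff = math.sqrt(se_source ** 2 + se_synth ** 2)
    z = (est_synth - est_source) / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```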
Conclusion
Synthetic clinical trial data can be used in place of source data to develop clinically meaningful insights for CAR T patient management. Because synthetic data carry far fewer privacy risks for patients and sponsors, this work demonstrates the feasibility of using them to significantly enhance the availability of clinical trial data & accelerate the discovery of new therapeutic approaches.
Lafeuille:Medidata, a Dassault Systèmes company: Current Employment. Shafquat:Medidata, a Dassault Systèmes company: Current Employment. Sang:Medidata, a Dassault Systèmes company: Current Employment. Beigi:Medidata, a Dassault Systèmes company: Current Employment. Maura:Sanofi: Consultancy, Honoraria; Medidata: Consultancy, Honoraria. Aptekar:Medidata, a Dassault Systèmes company: Current Employment.